top-k operator
Differentiable Fast Top-K Selection for Large-Scale Recommendation
Zhu, Yanjie, Zhang, Zhen, Wang, Yunli, Wang, Zhiqiang, Li, Yu, Zhou, Rufan, Wen, Shiyang, Jiang, Peng, Lin, Chenhao, Yang, Jian
Cascade ranking is a widely adopted paradigm in large-scale information retrieval systems for Top-K item selection. However, the Top-K operator is non-differentiable, hindering end-to-end training. Existing methods include Learning-to-Rank approaches (e.g., LambdaLoss), which optimize ranking metrics like NDCG and suffer from objective misalignment, and differentiable sorting-based methods (e.g., ARF, LCRON), which relax permutation matrices for direct Top-K optimization but introduce gradient conflicts through matrix aggregation. A promising alternative is to directly construct a differentiable approximation of the Top-K selection operator, bypassing soft permutation matrices altogether. However, even state-of-the-art differentiable Top-K operators (e.g., LapSum) require $O(n \log n)$ time because they rely on sorting to solve for the selection threshold. We therefore propose DFTopK, a novel differentiable Top-K operator with optimal $O(n)$ time complexity. By relaxing the normalization constraints, DFTopK admits a closed-form solution and avoids sorting; it also sidesteps the gradient conflicts inherent in differentiable sorting-based methods. We evaluate DFTopK on the public RecFlow benchmark and in an industrial system. Experimental results show that DFTopK significantly improves training efficiency while achieving superior performance, which allows us to scale up training samples more efficiently. In an online A/B test, DFTopK yielded a +1.77% revenue lift under the same computational budget compared with the baseline. To the best of our knowledge, this work is the first to introduce differentiable Top-K operators into recommendation systems and the first to achieve theoretically optimal linear-time complexity for Top-K selection. We have open-sourced our implementation to facilitate future research in both academia and industry.
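DFTopK's sorting-free closed-form construction is not reproduced in the abstract, so the following is only a minimal sketch of the general idea of a differentiable Top-K relaxation: replace the hard 0/1 selection with a sigmoid mask around a threshold placed between the k-th and (k+1)-th largest scores. The threshold placement via torch.topk, the temperature, and the function name are illustrative assumptions, not the paper's operator.

```python
import torch

def soft_topk_mask(scores: torch.Tensor, k: int, temperature: float = 0.1) -> torch.Tensor:
    """Illustrative sigmoid relaxation of a Top-K indicator (not DFTopK itself).

    The hard Top-K mask is 1 for the k largest scores and 0 elsewhere; here the
    threshold sits halfway between the k-th and (k+1)-th largest scores and the
    margins are squashed with a sigmoid, so gradients reach every score.
    """
    n = scores.shape[-1]
    assert 0 < k < n
    top_vals = torch.topk(scores, k + 1, dim=-1).values   # note: relies on partial sorting
    tau = (top_vals[..., k - 1] + top_vals[..., k]) / 2.0
    return torch.sigmoid((scores - tau.unsqueeze(-1)) / temperature)

# The relaxed mask sums approximately to k and is differentiable end to end.
x = torch.randn(8, requires_grad=True)
mask = soft_topk_mask(x, k=3)
mask.sum().backward()
```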
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Asia > China > Shandong Province > Dongying (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > Canada (0.04)
A Related Work on Compression Techniques in Distributed Optimization and Learning
The analysis in this section follows the techniques introduced in [8]. The proof of Proposition 2 follows roughly the same steps as the proof of Proposition 1. In this section, we compile some results that will prove useful later in our analysis. With this in mind, we assume throughout this section that all clients perform the same number of local updates. To that end, we make use of the following lemma. To prove Theorem 5, we construct an example involving two clients.
Review for NeurIPS paper: Differentiable Top-k with Optimal Transport
Additional Feedback: Some comments: l. 117 The entropic OT is surely not more computationally friendly than a top-k operator that simply sorts the vector. The same holds for the beam-search method. The present work seems to be a sequence of ad-hoc definitions rather than a principled objective; in particular, it is important to make the optimization objective clear to enable future comparisons. Can the authors clearly distinguish their contributions from those of Cuturi et al., 2019? It seems that the authors' implementation is potentially faster, which should be highlighted.
MoDification: Mixture of Depths Made Easy
Zhang, Chen, Zhong, Meizhi, Wang, Qimeng, Lu, Xuantao, Ye, Zheyu, Lu, Chengqiang, Gao, Yan, Hu, Yao, Chen, Kehai, Zhang, Min, Song, Dawei
Long-context efficiency has recently become a trending topic in serving large language models (LLMs), and mixture of depths (MoD) has been proposed as a perfect fit to bring down both latency and memory. In this paper, however, we discover that MoD can barely transform existing LLMs without costly training over an extensive number of tokens. To enable transformations from any LLM to a MoD one, we show that the top-k operator in MoD should be promoted to a threshold-p operator, and that refinements to the architecture and data should be crafted accordingly. All these designs form our method, termed MoDification. Through a comprehensive set of experiments covering model scales from 3B to 70B, we show that MoDification strikes an excellent balance between efficiency and effectiveness. MoDification can achieve up to ~1.2x speedup in latency and ~1.8x reduction in memory compared to original LLMs, especially in long-context applications.
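As a rough illustration of the promotion the abstract describes, the two routing rules can be contrasted as follows; the router logits, the sigmoid gating, and the threshold value p = 0.5 are assumptions made for this sketch, not details taken from the paper.

```python
import torch

def route_topk(router_logits: torch.Tensor, k: int) -> torch.Tensor:
    """Capacity-style MoD routing: the k highest-scoring tokens per sequence
    pass through the block (mask = 1); the rest skip it."""
    idx = torch.topk(router_logits, k, dim=-1).indices
    mask = torch.zeros_like(router_logits)
    mask.scatter_(-1, idx, 1.0)
    return mask

def route_threshold_p(router_logits: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Threshold-p routing: a token passes whenever its routing probability
    exceeds p, so the number of processed tokens varies per sequence."""
    return (torch.sigmoid(router_logits) > p).float()

logits = torch.randn(2, 6)                 # 2 sequences, 6 tokens each (toy example)
print(route_topk(logits, k=2))             # exactly 2 tokens per sequence
print(route_threshold_p(logits, p=0.5))    # data-dependent number of tokens
```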
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.05)
Beam Tree Recursive Cells
Chowdhury, Jishnu Ray, Caragea, Cornelia
We propose Beam Tree Recursive Cell (BT-Cell), a backpropagation-friendly framework to extend Recursive Neural Networks (RvNNs) with beam search for latent structure induction. We further extend this framework by proposing a relaxation of the hard top-k operators in beam search for better propagation of gradient signals. We evaluate our proposed models on different out-of-distribution splits of both synthetic and realistic data. Our experiments show that BT-Cell achieves near-perfect performance on several challenging structure-sensitive synthetic tasks like ListOps and logical inference while maintaining comparable performance on realistic data against other RvNN-based models. Additionally, we identify a previously unknown failure case for neural models in generalization to an unseen number of arguments in ListOps. The code is available at: https://github.com/JRC1995/BeamTreeRecursiveCells.
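BT-Cell's exact relaxation is not spelled out in the abstract; as a generic illustration of how a hard top-k selection can be made backpropagation-friendly, here is a straight-through-style estimator that keeps the exact selection on the forward pass while routing gradients through a sigmoid-relaxed mask. The temperature and the straight-through trick are assumptions of this sketch, not the paper's method.

```python
import torch

def straight_through_topk(scores: torch.Tensor, k: int, temperature: float = 0.1) -> torch.Tensor:
    """Hard top-k mask on the forward pass, soft gradients on the backward pass."""
    topk = torch.topk(scores, k, dim=-1)
    hard = torch.zeros_like(scores).scatter_(-1, topk.indices, 1.0)
    tau = topk.values[..., -1:].detach()                     # k-th largest score
    soft = torch.sigmoid((scores - tau) / temperature)       # relaxed mask
    return soft + (hard - soft).detach()                     # forward = hard, grad = soft

beam_scores = torch.randn(5, requires_grad=True)
mask = straight_through_topk(beam_scores, k=2)
(mask * beam_scores).sum().backward()                        # gradients reach all candidates
```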
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Sander, Michael E., Puigcerver, Joan, Djolonga, Josip, Peyré, Gabriel, Blondel, Mathieu
The top-k operator returns a sparse vector, where the non-zero values correspond to the k largest values of the input. Unfortunately, because it is a discontinuous function, it is difficult to incorporate in neural networks trained end-to-end with backpropagation. Recent works have considered differentiable relaxations, based either on regularization or perturbation techniques. However, to date, no approach is fully differentiable and sparse. In this paper, we propose new differentiable and sparse top-k operators. We view the top-k operator as a linear program over the permutahedron, the convex hull of permutations. We then introduce a p-norm regularization term to smooth out the operator, and show that its computation can be reduced to isotonic optimization. Our framework is significantly more general than the existing one and allows us, for example, to express top-k operators that select values in magnitude. On the algorithmic side, in addition to pool adjacent violators (PAV) algorithms, we propose a new GPU/TPU-friendly Dykstra algorithm to solve isotonic optimization problems. We successfully use our operators to prune weights in neural networks, to fine-tune vision transformers, and as a router in sparse mixture of experts.
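The reduction to isotonic optimization is the computational core of the paper. For reference, a compact pool-adjacent-violators (PAV) solver for the plain L2 isotonic regression subproblem is sketched below; the p-norm-regularized operators and the GPU/TPU-friendly Dykstra variant are not reproduced here.

```python
import numpy as np

def isotonic_l2(y: np.ndarray) -> np.ndarray:
    """PAV for min_v ||v - y||^2 subject to v[0] <= v[1] <= ... <= v[n-1]."""
    means, sizes = [], []
    for value in y:
        means.append(float(value))
        sizes.append(1)
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(means) > 1 and means[-2] > means[-1]:
            m, s = means.pop(), sizes.pop()
            means[-1] = (means[-1] * sizes[-1] + m * s) / (sizes[-1] + s)
            sizes[-1] += s
    out = np.empty(len(y))
    pos = 0
    for m, s in zip(means, sizes):
        out[pos:pos + s] = m
        pos += s
    return out

print(isotonic_l2(np.array([3.0, 1.0, 2.0, 5.0, 4.0])))   # [2.  2.  2.  4.5 4.5]
```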
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification
Garcin, Camille, Servajean, Maximilien, Joly, Alexis, Salmon, Joseph
In modern classification tasks, the number of labels is getting larger and larger, as is the size of the datasets encountered in practice. As the number of classes increases, class ambiguity and class imbalance become more and more problematic for achieving high top-1 accuracy. Meanwhile, Top-K metrics (metrics allowing K guesses) have become popular, especially for performance reporting. Yet, proposing top-K losses tailored for deep learning remains a challenge, both theoretically and practically. In this paper we introduce a stochastic top-K hinge loss inspired by recent developments on top-K calibrated losses. Our proposal is based on smoothing the top-K operator, building on the flexible "perturbed optimizer" framework. We show that our loss function performs very well on balanced datasets, while benefiting from a significantly lower computational time than the state-of-the-art top-K loss function. In addition, we propose a simple variant of our loss for the imbalanced case. Experiments on a heavy-tailed dataset show that our loss function significantly outperforms other baseline loss functions.
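The "perturbed optimizer" smoothing can be sketched as follows: add Gaussian noise to the scores, take the hard top-K indicator of each noisy copy, and average. The noise scale and sample count below are illustrative, and the sketch omits the gradient estimator the actual loss would attach to this expectation.

```python
import torch

def perturbed_topk_indicator(scores: torch.Tensor, k: int,
                             sigma: float = 0.5, n_samples: int = 256) -> torch.Tensor:
    """Monte Carlo estimate of the smoothed indicator E[top-K(scores + sigma * Z)]."""
    noise = sigma * torch.randn(n_samples, *scores.shape)
    perturbed = scores.unsqueeze(0) + noise                          # (S, n)
    idx = torch.topk(perturbed, k, dim=-1).indices
    hard = torch.zeros_like(perturbed).scatter_(-1, idx, 1.0)        # hard masks per sample
    return hard.mean(dim=0)                                          # smooth in expectation

scores = torch.tensor([2.0, 1.9, 0.5, -1.0])
print(perturbed_topk_indicator(scores, k=2))   # near-tied scores receive fractional mass
```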
- North America > Canada > Ontario > Toronto (0.14)
- Europe > France > Occitanie > Hérault > Montpellier (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Differentiable Top-k Operator with Optimal Transport
Xie, Yujia, Dai, Hanjun, Chen, Minshuo, Dai, Bo, Zhao, Tuo, Zha, Hongyuan, Wei, Wei, Pfister, Tomas
The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component that is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using the bubble sort algorithm, the resulting model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether an element belongs to the top-k set is essentially discontinuous. To address this issue, we propose a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of the EOT problem. We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance.
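Abstracting away the paper's exact parameterization, the EOT view can be sketched as a tiny Sinkhorn problem: transport n scores (mass 1/n each) to two anchors, a "selected" anchor that must receive total mass k/n and a "not selected" anchor with mass (n-k)/n; the mass each score sends to the selected anchor, rescaled by n, acts as a soft top-k indicator. The anchor positions, squared cost, regularization strength, and iteration count are assumptions of this sketch.

```python
import torch

def sinkhorn_soft_topk(scores: torch.Tensor, k: int,
                       eps: float = 0.05, n_iters: int = 200) -> torch.Tensor:
    """Soft top-k indicator from a two-column entropic OT problem (illustrative)."""
    n = scores.numel()
    x = (scores - scores.min()) / (scores.max() - scores.min() + 1e-9)   # rescale to [0, 1]
    anchors = torch.tensor([0.0, 1.0])                                   # not selected / selected
    cost = (x.unsqueeze(1) - anchors.unsqueeze(0)) ** 2                  # (n, 2) squared cost
    a = torch.full((n,), 1.0 / n)                                        # source marginal
    b = torch.tensor([(n - k) / n, k / n])                               # target marginal
    K = torch.exp(-cost / eps)
    u, v = torch.ones(n), torch.ones(2)
    for _ in range(n_iters):                                             # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.t() @ u)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)                           # transport plan
    return n * plan[:, 1]                                                # soft indicator in [0, 1]

s = torch.tensor([0.3, 2.0, -1.0, 1.8, 0.1])
print(sinkhorn_soft_topk(s, k=2))    # largest values for the two highest scores
```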
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)